We needed to run a tree test. So we built the tool
How a student IA study became a free research pipeline for everyone.
See the whole story, not just a score
Each stage produces a different view of how people actually think about your content. Stack them, and a hunch — "this navigation might work" — becomes an answer with receipts.
01
Card sort
How do people actually group your content?
→
02
Site map
AI proposes a structure in participants' language.
→
03
Tree test
Can people find things in that structure?
→
04
Refine
AI suggests fixes. You re-test.
01 · Card sort analysis
What participants think belongs together
Before you design the menu, ask your users how they'd organize it. Three views turn raw sort data into the shape of their mental model.
Similarity matrix
Percentage of participants who placed each pair in the same group.
The darker blocks along the diagonal are the natural clusters hiding in your content — items that belong together whether you planned it or not.
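The math behind this view is simple enough to sketch. A minimal version in Python, assuming each participant's sort arrives as a mapping of group label to item list (our assumed shape for illustration, not necessarily the tool's internal format):

```python
from itertools import combinations

def similarity_matrix(sorts, items):
    """Percent of participants who placed each item pair in the same group.

    sorts: one dict per participant, mapping group label -> list of items.
    items: the full list of content items.
    """
    together = {pair: 0 for pair in combinations(sorted(items), 2)}
    for sort in sorts:
        for group in sort.values():
            for pair in combinations(sorted(group), 2):
                if pair in together:
                    together[pair] += 1
    n = len(sorts)
    return {pair: round(100 * count / n) for pair, count in together.items()}

sorts = [
    {"Reading": ["Novel", "Math Text Book"], "Fun": ["Comic Book", "Lego Set"]},
    {"Books": ["Novel", "Math Text Book", "Comic Book"], "Toys": ["Lego Set"]},
]
matrix = similarity_matrix(sorts, ["Novel", "Math Text Book", "Comic Book", "Lego Set"])
print(matrix[("Math Text Book", "Novel")])  # 100 -> a natural cluster
print(matrix[("Comic Book", "Lego Set")])   # 50  -> a split item
```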
Category label cloud
The group names participants typed, sized by how often they appeared.
Gives you the exact words users reach for — often different from the jargon your team defaulted to. Copy these into your nav labels.
Item confusion table
Each item's top group, second-choice group, and how many groups it ended up in.
Flags orphans — items that never found a consistent home. These are the navigation risks to resolve before a tree test.
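The confusion table is the same aggregation turned inside out. A sketch, reusing the assumed sort format from the similarity-matrix example:

```python
from collections import Counter

def item_confusion(sorts):
    """Per item: top group, runner-up group, and how many groups it landed in."""
    placements = {}  # item -> Counter of group labels it was sorted into
    for sort in sorts:
        for label, group in sort.items():
            for item in group:
                placements.setdefault(item, Counter())[label] += 1
    rows = []
    for item, counts in placements.items():
        ranked = counts.most_common()
        rows.append({
            "item": item,
            "top_group": ranked[0][0],
            "second_group": ranked[1][0] if len(ranked) > 1 else None,
            "spread": len(ranked),  # high spread = orphan risk
        })
    return sorted(rows, key=lambda row: -row["spread"])

# In an open sort you would normalize label variants first ("Reading" vs
# "Books") so that spread reflects real disagreement rather than wording.
```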
02 · AI site map
Sort data becomes an editable tree
✦ AI Insights
Clustered 43 items into 5 top-level groups. Flagged "Comic Books" — split across 3 sorts — and suggested elevating it to its own branch.
Card sort clusters
Books & Paper (12)
Collectibles (8)
Toys & Games (14)
Home & Kitchen (9)
✦ AI · Proposed site map
Clusters feed directly into a proposed tree. You review, edit, and approve — then AI writes tasks with every correct path verified against the tree.
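Verifying a correct path is a plain walk down the tree. A minimal sketch, with the site map as nested dicts (an assumed shape, not the tool's internals):

```python
def path_exists(tree, path):
    """Check that a task's stated correct path is a real walk down the tree."""
    node = tree
    for label in path:
        if label not in node:
            return False
        node = node[label]
    return True

sitemap = {
    "Books & Paper": {"Comics": {}, "Textbooks": {}},
    "Toys & Games": {},
}
assert path_exists(sitemap, ["Books & Paper", "Comics"])
assert not path_exists(sitemap, ["Toys & Games", "Comics"])  # unanswerable task
```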
03 · Tree test dashboard
Did people find what they were looking for?
Six linked views — each answering a different question about the same responses. Together they separate "wrong answer" from "right answer, wrong path" from "right path, felt terrible."
Success rate by task
Direct success, indirect success, and failures stacked per task.
The headline chart stakeholders recognize. Anything under 70% is worth investigating — the color split tells you which fix to try first.
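The direct/indirect split comes down to whether a participant backtracked before answering. A sketch of one simple classification, assuming each response records the nodes visited in order (root first) and the final answer; the field names here are hypothetical:

```python
def classify(response, correct_destinations):
    """Direct: right answer, no backtracking. Indirect: right answer after detours."""
    visited = response["visited"]  # node labels in the order they were clicked
    if response["answer"] not in correct_destinations:
        return "fail"
    # One simple backtracking proxy: any node visited more than once.
    backtracked = len(visited) > len(set(visited))
    return "indirect" if backtracked else "direct"

def task_breakdown(responses, correct_destinations):
    counts = {"direct": 0, "indirect": 0, "fail": 0}
    for response in responses:
        counts[classify(response, correct_destinations)] += 1
    return {k: round(100 * v / len(responses)) for k, v in counts.items()}

responses = [
    {"visited": ["Home", "Books & Paper", "Comics"], "answer": "Comics"},
    {"visited": ["Home", "Toys & Games", "Home", "Books & Paper", "Comics"],
     "answer": "Comics"},  # went back up to Home: indirect
    {"visited": ["Home", "Toys & Games"], "answer": "Toys & Games"},  # fail
]
print(task_breakdown(responses, {"Comics"}))
# {'direct': 33, 'indirect': 33, 'fail': 33}
```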
First-click heatmap
Where each participant's first click landed on every task. Green outlines mark the correct top-level branch.
First click is the single strongest predictor of task success. If the first click lands on the wrong branch, the task rarely recovers.
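You can check that claim against your own data in a few lines. A sketch, reusing the hypothetical response shape from the success-rate example (visited[0] is the root, so visited[1] is the first real click):

```python
def first_click_rate(responses, correct_branch):
    """Share of first clicks that landed on the correct top-level branch."""
    hits = sum(1 for r in responses if len(r["visited"]) > 1
               and r["visited"][1] == correct_branch)
    return round(100 * hits / len(responses))

def success_by_first_click(responses, correct_branch, correct_destinations):
    """Success rate split by first-click correctness: the evidence behind
    'first click predicts success'."""
    buckets = {True: [], False: []}
    for r in responses:
        right_first = len(r["visited"]) > 1 and r["visited"][1] == correct_branch
        buckets[right_first].append(r["answer"] in correct_destinations)
    return {("right" if k else "wrong"): round(100 * sum(v) / len(v))
            for k, v in buckets.items() if v}
```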
Path dendrogram
Every route participants actually took through the tree, with branch thickness weighted by traffic.
Shows the wrong turns, not just the wrong answers — where people got close but derailed. Often more actionable than a success percentage.
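Under the hood this is path counting: identical routes collapse into one branch, weighted by how many participants took it. A sketch with the same assumed response shape:

```python
from collections import Counter

def path_weights(responses):
    """Collapse identical routes; weight = number of participants on that route."""
    return Counter(tuple(r["visited"]) for r in responses)

# The heaviest routes that do NOT end at a correct destination are the wrong
# turns worth fixing first: they show where people got close but derailed.
```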
Success × time scatter
Each task plotted by median completion time (x) and success rate (y).
Top-left (fast, high success) is intuitive; bottom-right (slow, low success) signals structural problems — different failure modes with different fixes.
Time distribution
Min, quartiles, median, and max completion time per task.
Averages hide outliers. A task with huge variance means some people breezed through and some got stuck — a spread that's often more informative than the mean.
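The box-plot numbers are a five-number summary per task, which Python's standard library computes directly (statistics.quantiles, available since Python 3.8):

```python
import statistics

def five_number_summary(times):
    """Min, quartiles, median, and max of one task's completion times (seconds)."""
    q1, median, q3 = statistics.quantiles(times, n=4)
    return {"min": min(times), "q1": q1, "median": median, "q3": q3,
            "max": max(times)}

times = [8, 9, 11, 12, 14, 45, 120]  # two participants got stuck
print(five_number_summary(times))
# A median of 12s next to a max of 120s is the variance the mean would hide.
```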
SEQ ease scores
Single-Ease-Question self-rating (1–7) averaged per task.
Behavior tells you what they did; SEQ tells you how it felt. Tasks that succeed but feel hard are fragile — small friction, big downstream cost.
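Crossing the two signals is where this pays off: high success plus a low SEQ score marks a fragile task. A sketch with illustrative thresholds (ours, not the tool's):

```python
def fragile_tasks(tasks, success_floor=80, seq_ceiling=4.5):
    """Tasks that succeed behaviorally but feel hard (SEQ is 1-7, higher = easier).

    Thresholds here are illustrative defaults, not the tool's.
    """
    return [t["id"] for t in tasks
            if t["success"] >= success_floor and t["seq"] <= seq_ceiling]

tasks = [
    {"id": "T1", "success": 91, "seq": 6.2},  # succeeds and feels easy
    {"id": "T2", "success": 85, "seq": 3.9},  # succeeds but feels hard: fragile
    {"id": "T3", "success": 40, "seq": 2.1},  # fails outright: structural fix
]
print(fragile_tasks(tasks))  # ['T2']
```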
04 · AI refinement loop
From data to a decision you can act on
A dashboard on its own doesn't tell you what to change. The final step reads the data and writes the edits — so one round reliably becomes two.
↻ Round 2 → Round 3
Success rate up 24% after applying the suggested edits. Tasks T3 & T4 moved red → green.
Problem areas
High · Math Text Book · 33%
High · Diecast Car · 17%
Med · Vintage Camera · 8%
✦ AI · Suggested edits
↗ Elevate "Books" to top-level category
↗ Move "Diecast" into Vehicles & Models
↗ Relabel "Antiques" → "Vintage & Antiques"
The AI reads the dashboard and writes plain-English edits with severity. Approve, re-run, and the iteration counter tracks success-rate lift across rounds.
Card sort to validated navigation in one pipeline
Every step of the research process — from raw content items to a tested site map — runs inside one tool. AI handles the overhead. You handle the research.
Card Sort
Upload your content, pick open, closed, or hybrid, and share one link. Participants sort in their browser — no software, no account, no install.
AI Site Map
AI reads your sort data and proposes a tree in participants' own language. You review, edit, and approve before the test runs — the AI doesn't ship anything without you.
Tree Test
AI writes realistic tasks with paths verified against your tree, so you never ship a test that asks for something the sitemap can't answer. Collect unmoderated responses and read the full dashboard.
Rich Dashboards
Similarity matrices, first-click heatmaps, confusion tables, path dendrograms, SEQ scores — every view the paid platforms give you, and none of the paywalls.
AI Insights
AI reads the full dashboard and writes a plain-English summary: what's failing, what's working, and what to change. Each recommendation ships with a severity badge — no jargon, no guessing which chart to weight.
Iterate & Improve
Apply the suggested edits, re-run, compare rounds. The iteration counter tracks success-rate lift between tests — so you can show, not claim, that the navigation got better.
Why we built this
Rigorous IA research shouldn't depend on your budget
Every researcher deserves infrastructure that matches their methodology, not their institution's software budget. The established platforms provide that depth of data, but they sit behind paywalls that researchers on student budgets can't reach.
01 · Budget
Free, not a trial
Maze starts at $99/mo. Optimal Workshop runs $199/mo. UserTesting climbs to $40k/yr. TreeTest AI runs in your browser on your own API key — Gemini's free tier (1,500 requests/day, no credit card) is more than enough for a semester of research.
02 · Repeatability
Built to run more than once
Paid tools charge per study, so student and indie projects usually end up with one "hope it worked" round. TreeTest AI is built around the iteration loop: apply AI suggestions, re-run, compare rounds. The point isn't one clean report — it's reaching a navigation that actually works.
03 · Accessibility
For designers, not methodology PhDs
AI helps you do the parts paid tools assume you already know — writing unbiased tasks, reading SEQ scores, interpreting first-click heatmaps. You stay on the decisions; the AI handles the vocabulary.
API key
Unlock AI insights with your own key
AI features run through your API key, not ours. No shared quota, no queue, no credit card on our side — you keep direct control of usage and cost.
01 · What it is
A personal key, pasted once
An API key is a personal token that lets TreeTest AI talk to an AI model on your behalf. Paste it in settings once and every AI feature — sitemap clustering, task writing, dashboard insights — routes through your own account.
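In practice, "routes through your own account" means each AI feature makes a direct, key-authenticated call to the provider. A minimal illustration using Anthropic's official Python SDK (placeholder key and prompt; TreeTest AI's actual requests will differ):

```python
import anthropic

# Your personal key, created in the provider's console and pasted once into
# settings. Placeholder below; never commit a real key.
client = anthropic.Anthropic(api_key="sk-ant-...")

message = client.messages.create(
    model="claude-sonnet-4-20250514",  # any current Claude model works
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Cluster these card-sort groups into a draft site map: ...",
    }],
)
print(message.content[0].text)  # the call, and its cost, stay on your account
```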
02 · Recommended ✦ Best fit
Claude, by Anthropic
We've had the most reliable results with Anthropic's Claude. Task writing is sharper, sitemap suggestions stay closer to the actual data, and dashboard summaries land with less editing. Gemini's free tier also works and costs nothing.
03 · What it costs
Under $1 for 50+ runs
Negligible. All the testing we did to build this platform cost less than a dollar total, across 50+ AI runs on Claude. A full research cycle — clustering, tasks, insights — lands in pennies.